# Project Overview

This repository contains code for conducting the main analyses in _Diagnosing Physician Error: A Machine Learning Approach to Low-Value Health Care_ (Mullainathan, Obermeyer). These scripts were written to run on Partners servers using a cohort of emergency department (ED) visits between 2010 and 2015. The codebase in its current form was put together by Cassidy Shubatt (<cshubatt@gmail.com>) and draws on several previous iterations of the project, particularly the work of Advik Shreekumar, who conducted similar analyses in Medicare claims data.

## Cohort and Medical History Features
These analyses take two inputs: a cohort table containing our outcomes of interest, and a large features dataset used to train our models. The outcomes of interest and medical features are all constructed from raw medical codes: ICD-9 codes for diagnoses and procedures, LOINC codes for labs, etc. The mappings we used to translate these codes into useful variables can be found in the `lib/xwalks` directory. The diagnosis and procedure codes that comprise our main outcomes (stress test, catheterization, MACE, stent, and CABG) are detailed in `xwalks/stress_test_codes.yml`. The rest of the files in `xwalks` were used to build our modeling features; for more information, see `lib/xwalks/README.md`.

## Code Structure
In general, the codebase is set up to be run in the order of the numbers at the beginning of each directory or script name. For example, the directory `01_build_models` should be run before the directory `02_prep_and_summarize_cohort`. While the numbering system offers some intuition about the order in which scripts should run, each directory contains a `Makefile` (and usually a corresponding `config.mk` file) which explicitly specifies script targets and dependencies. To recreate all analyses, it is sufficient to run the command `make` sequentially in each directory (and, when relevant, each subdirectory). Note that the files at the top level of a given directory are not the substantive scripts but shell scripts that call the corresponding R scripts in the `scripts` subdirectory. This idiosyncratic structure is a consequence of logistics on the Partners server where we ran these analyses. All package dependencies used in this project are stored in the conda environment `stressr`, which can be built using `lib/stressr.yml`.
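
The sequential `make` workflow described above could be scripted roughly as follows. This is a minimal sketch, not part of the repository: the function name `run_all` is hypothetical, and it assumes the numbered directories sit directly under the repository root with two-digit prefixes (as in `01_build_models`). It does not descend into subdirectories, so any subdirectory with its own `Makefile` would still need to be run by hand.

```shell
# Hypothetical helper: run `make` in each numbered directory, in order.
# Assumes directories are named NN_<name> and live directly under the root.
run_all() {
  local root="${1:-.}"
  local dir
  # Glob results are sorted, so zero-padded prefixes (01_, 02_, ...) run in order.
  for dir in "$root"/[0-9][0-9]_*/; do
    # Skip anything without a Makefile (also covers the no-match case).
    [ -f "${dir}Makefile" ] || continue
    echo "==> make in ${dir}"
    # Stop at the first failure, since later directories depend on earlier ones.
    "${MAKE:-make}" -C "$dir" || return 1
  done
}
```

With the `stressr` environment built and activated (for example, `conda env create -f lib/stressr.yml` followed by `conda activate stressr`), calling `run_all .` from the repository root would run each numbered directory in turn and stop at the first failure.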
